Principal components analysis to summarize microarray experiments: application to sporulation time series.

Authors

  • S Raychaudhuri
  • J M Stuart
  • R B Altman
Abstract

A series of microarray experiments produces observations of differential expression for thousands of genes across multiple conditions. It is often not clear whether a set of experiments is measuring fundamentally different gene expression states or similar states created through different mechanisms. It is useful, therefore, to define a core set of independent features for the expression states that allows them to be compared directly. Principal components analysis (PCA) is a statistical technique for determining the key variables in a multidimensional data set that explain the differences in the observations, and it can be used to simplify the analysis and visualization of multidimensional data sets. We show that application of PCA to expression data (where the experimental conditions are the variables, and the gene expression measurements are the observations) allows us to summarize the ways in which gene responses vary under different conditions. Examination of the components also provides insight into the underlying factors that are measured in the experiments. We applied PCA to the publicly released yeast sporulation data set (Chu et al. 1998). In that work, 7 different measurements of gene expression were made over time. PCA on the time-points suggests that much of the observed variability in the experiment can be summarized in just 2 components, i.e., 2 variables capture most of the information. These components appear to represent (1) overall induction level and (2) change in induction level over time. We also examined the clusters proposed in the original paper, and show how they are manifested in principal component space. Our results are available on the internet at http://www.smi.stanford.edu/project/helix/PCArray.
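
The setup described in the abstract (conditions as variables, genes as observations) can be illustrated with a minimal sketch. This is not the authors' code and does not use the Chu et al. data: the matrix below is synthetic, built from an assumed per-gene "level" and "slope" over 7 hypothetical time points, and the function name `pca_conditions` is made up for illustration. PCA is done here by eigendecomposition of the covariance matrix, which is one common way to compute it.

```python
import numpy as np


def pca_conditions(expr):
    """PCA with genes as observations (rows) and conditions as variables (columns).

    Returns the principal axes (columns of `components`), the fraction of
    variance explained by each, and the per-gene scores.
    """
    centered = expr - expr.mean(axis=0)          # center each condition
    cov = np.cov(centered, rowvar=False)         # conditions x conditions
    eigvals, eigvecs = np.linalg.eigh(cov)       # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]            # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    scores = centered @ eigvecs                  # genes projected onto components
    return eigvecs, explained, scores


# Synthetic stand-in data (an assumption, NOT the sporulation measurements):
# each gene has an overall level plus a linear trend over time, plus noise.
rng = np.random.default_rng(0)
n_genes, n_times = 500, 7
t = np.linspace(-1.0, 1.0, n_times)
level = rng.normal(0.0, 2.0, size=(n_genes, 1))
slope = rng.normal(0.0, 1.0, size=(n_genes, 1))
noise = rng.normal(0.0, 0.2, size=(n_genes, n_times))
expr = level + slope * t + noise

components, explained, scores = pca_conditions(expr)
top2 = explained[:2].sum()
print("variance explained per component:", np.round(explained, 3))
print("first two components together:", round(float(top2), 3))
```

Because the synthetic data are constructed from two underlying factors, most of the variance falls on the first two components by design; on real expression data the split would have to be read off the explained-variance values rather than assumed.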


Similar resources

Use of the Discrete Cosine Transform for Gene Expression Data Analysis

Analysis of microarray gene expression data using signal processing and statistical techniques has received considerable interest in recent years; however, the problem of organizing and visualizing these large data sets is still a pressing issue. This paper proposes the use of the discrete cosine transformation (DCT) to reduce the large number of dimensions of the microarray data, thereby simpli...


Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data

MOTIVATION: Microarray data are used in a range of application areas in biology, although they often contain considerable numbers of missing values. These missing values can significantly affect subsequent statistical analysis and machine learning algorithms, so there is a strong motivation to estimate these values as accurately as possible before using these algorithms. While many imputation algo...


BIOINFORMATICS Collateral Missing Value Imputation: A New Robust Missing Value Estimation Algorithm For Microarray Data

Motivation: Microarray data is used in a range of application areas in biology, though often it contains considerable numbers of missing values. These missing values can significantly affect subsequent statistical analysis and machine learning algorithms so there is a strong motivation to estimate these values as accurately as possible prior to using these algorithms. While many imputation algo...


Principal Component Analysis using Singular Value Decomposition of Microarray Data

A series of microarray experiments produces observations of differential expression for thousands of genes across multiple conditions. Principal component analysis (PCA) has been widely used in multivariate data analysis to reduce the dimensionality of the data in order to simplify subsequent analysis and allow for summarization of the data in a parsimonious manner. PCA, which can be implemented...


Dynamical Analysis of Circadian Gene Expression

Microarray techniques allow the simultaneous measurement of the expression levels of thousands of mRNAs. By mining these data one can identify the dynamics of the gene expression time series. Using principal component analysis, we uncover the circadian rhythmic patterns underlying the gene expression profiles from Cyanobacterium Synechocystis. We applied PCA to reduce the dimensionali...



Journal title:
  • Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing


Publication date: 2000